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FULLY  THREADED  TREE  FOR  ADAPTIVE  REFINEMENT  FLUID 
DYNAMICS  SIMULATIONS 


1.  Introduction 

Adaptive  mesh  refinement  (AMR)  is  used  to  increase  spatial  and  temporal  resolution  of  nu¬ 
merical  simulations  beyond  the  limits  imposed  by  the  available  hardware.  In  fluid  dynamics,  AMR 
is  used  in  simulations  of  both  steady  and  unsteady  flows  on  structured  and  unstructured  meshes 
[1-23].  A  mesh  is  structured  if  it  is  composed  of  cells  of  rectangular  shape,  otherwise  it  is  called 
unstructured.  Integration  of  the  equations  of  fluid  dynamics  is  easier  and  the  overhead  in  computer 
time  and  memory  per  computational  cell  are  minimal  for  structured  meshes.  The  corresponding 
overhead  is  usually  high  for  unstructured  meshes.  However,  an  unstructured  mesh  has  the  advan¬ 
tage  that  it  can  conform  well  to  a  complex  boundary.  Whether  to  use  a  structured  or  unstructured 
approach  depends  on  the  specific  problem,  code  availability,  and  general  preferences.  This  paper 
deals  with  structured  meshes,  and  specifically  with  a  structured-mesh  AMR. 

There  are  two  different  approaches  to  structured-mesh  AMR.  In  one  standard  approach,  cells 
are  organized  as  two-dimensional  or  three-dimensional  arrays  (grids).  A  coarse  base  grid  represents 
the  entire  computational  domain.  Finer,  nested  grids  are  layed  over  coarser  grids  where  more 
resolution  is  required.  A  grid  hierarchy  may  be  organized  as  a  tree  [1-12].  One  advantage  of 
this  approach  is  that  any  single-grid  fluid  flow  solver  can  be  used  without  modifications  for  AMR. 
However,  grids  themselves  are  inflexible  structures.  It  is  difficult  to  cover  complex  features  of  a  flow 
with  a  few  grids.  Many  grids  are  typically  required,  and  a  substantial  number  of  computational 
cells  can  be  wasted  on  a  smooth  flow.  Several  grids  of  the  same  level  of  refinment  may  overlap 
which  leads  to  duplication  of  cells.  Periodic  rebuilding  of  the  entire  grid  hierarchy  is  required  when 
the  flow  evolves  with  time. 

In  an  alternative  approach,  individual  computational  cells  are  organized  directly  in  a  tree  [13]. 
Each  cell  can  be  refined  or  unrefined  separately  from  the  others,  as  needed.  At  every  level  of  the 
tree,  the  mesh  may  have  an  arbitrary  shape,  as  opposed  to  grids.  Duplication  of  cells  is  avoided.  To 
date,  this  approach  has  been  used  mainly  in  steady-state  fluid  flow  simulations  [13-20].  It  has  been 
applied  to  unsteady  flows  in  two  dimensions  [21-23].  The  main  advantages  of  the  tree  approach 
are  the  flexibility  in  refinment  and  unrefinement,  and  the  efficiency  in  using  cells.  However,  there 
are  certain  problems  with  the  tree  approach:  (1)  Access  to  neighboring  cells  is  more  difficult  in 
a  tree  than  in  an  array.  Tree  traversals  to  access  nearest  neighbors  are  difficult  to  vectorize  and 
parallelize.  (2)  The  memory  overhead  to  maintain  a  tree,  though  less  than  for  unstructured  meshes, 
is  still  substantial  [21-23].  Thus,  an  efficient  parallel  access  to  information  on  a  tree,  and  reduction 
of  the  memory  overhead  are  the  two  problems  that  need  to  be  solved. 

•Another  problem,  (3),  arises  in  unsteady  flow  simulations  with  many  levels  of  tree  refinement. 
In  unsteady  simulations,  buffer  layers  of  finely  refined  cells  must  be  created  in  advance  and  ahead 
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of  shocks  to  prevent  them  from  running  out  of  the  fine  mesh.  Let  the  maximum  and  minimum 
levels  of  mesh  refinement  he  /niax  and  /min.  respectively,  and  the  cell  refinement  ratio  be  a  factor 
of  two.  If  refinement  is  done  between  time  steps  [21-23],  the  thickness  of  the  buffer  layers  should 
be  at  least  2^max_^min  finest  cells.  The  number  of  required  cells  then  increases  exponentially  when 
/min  —  /max  increases,  and  becomes  prohibitively  large,  especially  in  three  dimensions,  if  /max  /min 
is  large. 

This  paper  addresses  the  problems  outlined  above.  To  alleviate  problems  (1)  and  (2),  a  new 
structure,  a  fully  threaded  tree  (FTT),  is  designed.  In  an  ordinary  tree,  each  cell  has  pointers 
to  its  children,  if  any.  The  memory  for  keeping  these  pointers  is  not  used  in  leaves  (cells  which 
do  not  have  children).  In  a  threaded  tree,  this  free  memory  is  used  to  organize  “threads”  along 
which  the  tree  can  be  quickly  traversed  during  various  search  operations.  A  discussion  of  a  binary 
threaded  tree  in  which  threads  are  used  for  left-to-right  and  right-to-left  search  can  be  found  in 
[24].  'A  binary-,  quad-,  or  an  oct-tree  is  required  to  organize  a  one-,  two-,  or  three-dimensional 
regular  mesh,  respectively.  An  implementation  of  an  oct-tree  would  require  at  least  8  words  of 
memory  per  cell.  Some  of  this  memory  can  then  be  used  to  make  threads  through  leaves  in  order  to 
facilitate  finding  the  nearest  neighbors.  However,  this  scheme  cannot  find  neighbors  of  split  cells, 
and  cannot  be  parallelized  (Section  4).  In  the  FTT  described  in  this  paper,  every  cell,  whether  a 
leaf  or  not,  has  an  easy  access  to  its  children,  neighbors  and  parents.  One  may  say  that  an  FTT  is 
a  tree  threaded  in  all  possible  directions.  At  the  same  time,  the  memory  overhead  for  supporting 
an  FTT  is  significantly  reduced.  The  maintenance  of  an  octal  FTT  requires  two  words  of  memory 
per  computational  cell  only.  Another  important  property  of  a  FTT  is  that  all  operations  on  it, 
including  modifications  of  the  tree  structure  itself,  can  be  performed  in  parallel.  This  then  allows 
vectorization  and  parallelization  of  tree-based  AMR  algorithms. 

In  this  paper,  the  FTT  is  used  for  the  integration  of  the  Euler  equations  of  fluid  dynamics.  To 
alleviate  the  third  problem  outlined  above,  the  integration  in  time  and  tree  refinement  have  been 
coupled  together,  and  time  stepping  at  different  levels  of  the  tree  and  tree  refinements  of  these 
levels  have  been  interleaved.  As  a  result,  excessive  buffer  layers  of  refinement  ahead  of  moving 
shocks  are  no  longer  needed. 

The  following  sections  describe  the  equations  solved  (Section  2),  tree  structure  and  discretiza¬ 
tion  of  the  equations  on  the  tree  (Section  3),  tree  implementation  (Section  4),  integration  of  the 
Euler  equations  (Section  5),  refinement  and  unrefinement  (Section  6),  numerical  tests  (Section  7),. 
and  FTT  performance  (Section  8). 
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2.  Equations 


In  this  paper,  the  FTT  is  used  to  solve  the  Euler  equations  for  an  inviscid  flow 

%  +  V-(/>U)  =  0 . 

at 

~  +  V  •  (/>UU)  +  VF  =  />g, 

dt 

f)F 

—  +  V-((£  +  P)U)  =  pV  •  g  , 

at 


(1) 


where  p  is  the  mass  density,  U  is  the  fluid  velocity,  E  is  the  total  energy  density,  P  =  P(E{,p)  is 
the  pressure,  E{  =  E  -  pU2 / 2  is  the  internal  energy  density,  and  g  is  an  external  acceleration. 

Numerical  integration  of  (1)  requires  evaluating  of  numerical  fluxes  of  mass,  momenta,  and 
energy  at  cell  interfaces.  We  use  a  second  order  accurate  Godunov- type  method  based  on  the 
solution  of  a  Riemann  problem  to  evaluate  the  fluxes  [26-28].  This  is  one  of  the  monotone  methods 
known  to  provide  good  results  for  both  flows  with  strong  shocks,  and  for  moderately  subsonic  and 
turbulent  flows  [32-34].  However,  the  algorithms  described  in  this  paper  are  general,  and  can  be 
used  in  conjunction  with  other  high-order  monotone  methods  such  as  the  Flux-Corrected-Transport 
[35],  for  example. 


3.  FTT  structure  and  discretization 


A  computational  domain  is  a  cube  of  size  L .  It  is  subdivided  to  a  number  of  cubic  cells  of 
various  sizes  1/2,  1/4,  1/8, ...  of  L.  Cells  are  organized  in  an  oct-tree  with  the  entire  computational 
domain  being  the  root.  For  the  integration  and  refinement  algorithms  to  be  effective,  the  following 
information  must  be  easily  accessible  for  every  cell  i : 

iLv(i)  -  level  of  the  cell  in  the  tree 

iKy(i)  -  TRUE/FALSE  if  cell  is  split /unsplit 

iPr(i)  -  pointer  to  a  parent  cell 

iCh(i,j)  -  pointers  to  children,  j  —  1  8 

iNb(i,j )  -  pointers  to  neighbors,  j  =  1  -f  6. 

The  cell  size  is  related  to  iLv(i)  by  A,*  =  2”zLv^  L. 

Cells  in  the  tree  are  either  split  (have  children)  or  leaves  (do  not  have  children).  Relations, 
between  cells  in  the  tree,  and  the  directions  of  various  pointers  are  illustrated  in  Figure  1  for  a 
one-dimensional  binary  tree.  Relations  for  a  quad-  or  an  oct-tree  are  similar.  In  Figure  1,  the 
root  (cell  1)  represents  the  entire  computational  domain.  It  has  two  children  (cells  2  and  3),  each 
representing  half  of  the  domain.  There  will  be  four  or  eight  children  in  2-D  or  3-D,  respectively. 
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Cell  2  is  further  subdivided  on  two  cells  (cells  *1  and  5).  Neighboring  leaves  are  not  allowed  to 
differ  in  size  by  more  than  a  factor  of  two.  The  neighbor-neighbor  relation  is  not  reciprocal  for 
leaves  of  different  size  that  face  each  other.  In  Figure  1,  cell  5  has  cell  3  as  its  neighbor,  but  cell 
3  has  cell  2  as  its  neighbor,  and  not  cell  5.  A  j- th  neighbor  of  a.  cell  i  either  has  the  same  size  as 
the  cell  itself.  =  A;,  or  it  is  two  times  larger.  AiNb{ij)  =  2A*.  In  the  former  case,  the 

neighbor  may  be  a  leaf  or  a  split  cell.  In  the  latter  case,  it  can  only  be  a  leaf. 

Physical  state  information  Ux  =  {pi,  (/>U)J  is  kept  in  both  split  and  unsplit  cells.  Infor¬ 
mation  kept  in  split  cells  is  needed  to  evaluate  mass,  momenta  and  energy  fluxes  across  interfaces 
between  leaves  of  different  size  (Section  5),  and  to  make  decisions  about  refinement  and  unrefine¬ 
ment  (Section  6).  The  physical  state  vectors  for  split  cells  are  updated  by  averaging  over  children, 
when  required.  The  memory  overhead  for  keeping  state  vectors  of  split  cells  is 


Overhead 


Number  of  splits 
Number  of  leaves 


111  1 
_  _J_  -  _L - [_  ~  - 

2dim  '  2-^,m  23dim  2dim  —  1  ’ 


where  dim  =  1  4-  3  is  a  number  of  spatial  dimensions.  The  overhead  is  ~  100%  in  1-D,  ~  33%  in 
2-D,  and  ~  14%  in  3-D. 

Coordinates  r,  =  {a:,-,  y*,  2,}  of  cell  centers  are  associated  with  every  cell.  It  is  assumed  that 
neighbors  1  to  6  of  a  cell  are  left  X,  right  X,  left  Y,  right  Y,  left  Z,  and  right  Z  neighbors,  respectively, 
and  that  coordinates  increase  from  left  to  right:  <  xi  <  xiNb{i,2) >  ViNb(i, 3)  <  Vi  <  UiNb(i, 4)> 

ziNb(i,h)  <  zi  <  zi Nb(ifi)-  Keeping  coordinates  is  not  necessary  for  finite-difference  operations. 
They  are  useful,  however,  for  evaluating  long-range  forces  (e.g.,  gravity)  by  direct  summation  or 
multipole  expansion  algorithms. 


4.  Implementation  of  FTT 


The  FTT,  as  described  above,  can  be  implemented  by  storing  all  pointers,  coordinates  and  flags 
in  computer  memory,  and  then  modifying  them  when  the  tree  is  refined  or  unrefined.  However, 
such  a  direct  implementation  requires  20iV  words  of  memory,  where  N  is  the  number  of  cells 
in  the  tree.  This  memory  overhead  is  too  large.  In  addition,  such  a  tree  cannot  be  modified  in 
parallel.  Because  neighbors  of  a  cell  keep  pointers  to  that  cell,  cell  removal  requires  retargeting  these 
pointers.  Removing  one  cell  thus  affects  others.  A  conflict  appears  if  a  cell  and  its  neighbors  are  to 
be  removed  simultaneously.  Similar  problems  arise  if  neighboring  cells  are  created  simultaneously. 

An  improved  parallel  version  of  the  tree  is  based  on  the  following  three  observations:  (t)  When 
a  cell  is  split,  its  eight  children  are  created  simultaneously.  Thus,  they  may  be  stored  in  memory 
contiguously  [13],  so  that  only  one  pointer  is  needed  to  find  all  eight  children.  ( ii)  Since  all  eight 
children  (siblings)  are  kept  in  memory  together,  neighbor-neighbor  relations  between  them  are 
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known  automatically.  There  is  no  need  for  pointers  between  the  siblings.  ( Hi)  Neighbors  of  any 
cell  are  either  its  siblings  or  they  are  children  of  a  neighboring  parent.  There  is  a  small  fixed  number 
of  neighbor-neighbor  relations  between  siblings  of  neighboring  parents.  Thus,  a  child’s  neighbor 
that  belongs  to  a  different  parent  can  be  determined  without  search  if  parents  keep  pointers  to 
their  own  neighbors. 

The  FTT  structure  implemented  is  illustrated  in  Figure  2.  All  cells  are  organized  in  groups 
called  octs.  Each  oct  contains  eight  cells.  Each  cell  has  a  physical  state  vector  IA  associated  with 
it,  and  a  pointer  to  an  oct  which  contains  its  children,  if  any,  or  a  nil  pointer.  Each  oct  knows  its 
level,  OctLv.  which  is  equal  to  the  level  of  the  oct’s  cells.  Each  oct  has  a  pointer  OctPr  to  a  parent 
cell.  Each  oct  has  six  pointers  OctNb( 6)  to  parent  cells  of  neighboring  octs.  This  information 
is  enough  to  find  neighbors,  children  and  parents  of  every  cell  without  searching.  The  memory 
requirement  is  16  words  per  oct  or  2  words  per  cell.  The  memory  overhead  is  thus  significantly 
reduced.  The  price  for  this  is  the  computer-time  overhead  associated  with  computing  cell  pointers 
from  oct  pointers  and  oct  pointers  from  cell  pointers,  but  this  overhead  is  relatively  small  (Section 
8). 

If  needed,  octs  may  also  contain  coordinates  OctPos  of  their  centers.  These  coordinates  are 
equal  to  coordinates  of  the  corresponding  parent  cells.  Coordinates  of  cells  that  belong  to  an  oct 
can  be  found  by  adding  or  subtracting  Ai/2  from  the  corresponding  oct’s  coordinates.  Memory 
requirement  for  coordinates  is  thus  3  words  per  oct  or  |  words  per  cell,  which  increases  the  total 
memory  requirement  to  2|  words  per  cell. 

Figure  2  shows  that  FTT  allows  parallel  modifications  at  any  given  tree  level.  To  unrefine  a 
cell,  the  associated  oct  pointed  to  by  OctCh  must  be  destroyed,  and  OctCh  must  be  set  to  nii. 
No  other  changes  are  required  in  neighboring  octs  or  cells  located  at  the  same  tree  level.  This  is 
because  neighboring  octs  are  accessed  by  parent  cells  located  at  a  different  level  of  the  tree.  To 
refine  a  cell,  a  new  oct  must  be  created  with  all  necessary  pointers,  and  the  corresponding  OctCh 
pointer  of  the  refined  cell  must  be  set  to  a  new  oct.  Again,  no  other  changes  in  any  other  octs  or 
cells  located  at  the  same  tree  level  are  required. 

All  operations  on  the  tree  can  thus  be  expressed  as  a  sequence  of  parallel  operations  on  large 
groups  of  cells  that  belong  to  the  same  tree  level.  Therefore,  it  is  possible  to  introduce  a  general 
iterator  for  performing  operations  on  these  cells,  and  to  hide  the  details  of  parallelelization  inside 
it.  This  greatly  reduces  the  amount  of  code,  and  simplifies  coding.  To  perform  a  parallel  operation, 
each  processor  first  gathers  all  required  information  in  work  arrays  using  indirect  addressing.  Then, 
the  CPU-intensive  work  is  done  by  looping  over  contigous  blocks  of  memory.  After  that,  the  result 
is  scattered  back.  In  this  way,  the  code  performance  is  virtually  unaffected  by  the  distribution  of 
cells  in  a  physical  space. 
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5.  Integration  procedure 


Equations  (1)  are  integrated  in  time  with  different  time  steps  A /(/)  at  different  levels  of  the 
tree  /;  /min  <  /  <  /niax,  where  /min  and  /niax  is  the  minimum  and  maximum  levels  of  leaves.  A 
global  time  step  A /  =  A/(/ni}„)  is  determined  from  the  CFL  condition, 


A  t  =  cfl 


2  /min  L 

max,  (max,  (a{  +  |f^,j|)) 


where  a  is  the  sound  speed,  cfl  <  1  is  a  constant,  and  maximum  in  (2)  is  taken  over  all  three 
directions  j  =  x,y,z  and  over  all  leaves  i.  Time  steps  at  various  levels  are 


At(l)  =  2/min  1  At  . 


(3) 


Integration  at  different  levels  of  the  tree  is  interleaved  with  tree  refinement.  Let  us  designate  the 
procedure  of  advancing  level  /  one  step  At(l)  in  time  as  A{1 ),  and  the  procedure  of  tree  refinement 
at  level  /  as  7 Z(l).  The  A  procedure  is  described  later  in  this  section.  The  7 Z  procedure  consists  of 
refining  leaves  of  level  /  and  unrefining  split  cells  of  level  /  according  to  certain  refinement  criteria. 
This  procedure  is  described  in  Section  6. 

A  global  time  step  of  integration  can  be  expressed  as  a  recursion, 

Global  Time  Step  =  e>(/min)  ?  (4) 

where  the  procedure  S(l)  is  a  combination  of  advancing  and  refinement  procedures 

c/i\  -vn\S  s(l  +  !);  ^(O;  s(l  + 1);  A\l);  if  /  <  /  max ^ 

if  l  —  /max  •  (  ’ 

All  procedures  in  (5)  are  performed  from  left  to  right.  The  procedures  <S(/+1)  and  A{1)  are  repeated 
twice  when  directional  time-step  splitting  is  used  in  A  (see  below).  After  every  A  procedure, 
the  sequence  of  XYZ  one-dimensional  sweeps  is  recursively  reversed.  This  is  indicated  by  the  f 
superscript.  Note  that  72  can  change  both  /max  and  /min,  recursively.  Therefore,  the  sequence  of 
advances  and  refinements  generated  changes  dynamically,  and  generally  cannot  be  predicted  from 
starting  values  of  /min  and  /max.  The  sequence  can  be  determined  in  advance  only  if  /min  aild  /max 
are  fixed  during  a  global  time  step.  For  example,  for  /min  =  6  and  Zmax  =  8  fixed,  the  sequence 
generated  would  be  [72(6)  [72( 7)  [7^(8)  .4(8)  >C(8)]  A{ 7)  [72(8)  .4(8)  -4^(8)]  >C(7)]  .4(6)  [72(7) 
[72(8)  A( 8)  ^4t(8)]  A(T)  [72(8)  >1(8)  >ff(8)]  ^(7)]  ^(6)].  Square  brackets  separate  different  levels 

of  recursion. 
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Now  we  describe  the  advance  procedure.  A(I).  The  solution  at  every  level  /  is  advanced  in 
time  using  direction  splitting.  Equations  for  the  X-sweep  can  be  written  as 


0U_  _  OT{U) 
ot  dx 


(6) 


where  the  flux  vector  is  T  -  ( pVT ,  (E  +  P)VT ,  pUff.  pUTCy ,  pCA:=  ),  and  the  source  term  is 
5  =  (  0,  pUxgx,  pgT.  0.  0  ).  The  equations  for  the  Y-  and  Z-sweeps  are  similar. 

Let  U°  and  Un  be  a  state  vector  at  the  beginning  and  at  the  end  of  a  global  time  step, 
respectively.  Let  us  designate  right  and  left  neighbors  of  cell  i  in  the  direction  of  a  sweep  as  i+ 
and  i— ,  respectively,  and  let  us  introduce  the  following  quantities 


a  =  2/min 


At 

L 


(3  =  2-dima  , 


and  7 (i,l)  = 


if  iLv(i.+)  =  /; 
if  iLv(i+)  =  l  -  1  , 


w'here  dim  =  1  -f  3  is  the  number  of  spatial  dimensions.  The  ^4(/)  procedure  can  then  be  described 
in  a  f6rm  of  the  following  pseudocode: 

for  (  leaves  i  of  level  l  -  1  )  {  if  (  i  has  split  neighbors  )  ZV”  :=  Iff;  } 
for  (  X,  Y,  Z  directions  )  { 
for  (  leaves  i  of  level  l  )  { 

if  (  i  +  is  a  leaf  or  boundary  )  { 

Compute  fluxes  at  the  interface  ; 

U?  :=U?-aFi,i+  +  £it{l)Si  ; 

K+  :=^n+  +  7^,.+; 

} 

if  (  i  —  is  a  leaf  of  level  l  —  1  or  boundary  )  { 

Compute  fluxes  at  the  ( i—,i )  interface  ;  (7) 

Uff  :=U? +  afi-,i  ; 

U?_  :=  U?_  -  pFi-y, 


} 

} 

for  (  leaves  i  of  level  l  )  {  if  (  i  has  no  neighbors  at  level  l  —  1  )  Iff  :=  Iff ;  } 

} 

9<dim 

for  (  split  cells  i  of  level  l  )  {  Iff  2-d,m  ^  ^iCh(i.j)  1  } 

j=i 

for  (  leaves  i  of  level  l  )  {  if  (  i  has  split  neighbors  )  Uf  :=  Uf;  }  . 
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The  factor  2-dini  in  /?,  7.  and  in  (7)  takes  into  account  the  difference  in  volumes  of  neighboring 
cells  of  different  sizes. 

The  algorithm  (4 )-( 7 )  is  schematically  illustrated  in  Figure  3  for  the  two-dimensional  case.  In 
this  particular  example,  the  tree  has  only  two  levels  occupied  by  leaves,  /max  =  /min  +  1*  Cells  1. 
2.  3  and  4  belong  to  level  /max.  and  cells  5  and  6  belong  to  level  /min.  During  a  time  step  at  level 
I  =  /max*  fluxes  accross  interfaces  (1*3),  (2,4),  (3,5),  and  (4,5)  are  evaluated,  new  state  vectors  Un 
for  cells  1,  2,  3  and  4  are  computed,  and  old  state  vectors  U°  in  these  cells  are  updated.  For  cell 
5,  is  changed  due  to  fluxes  T\ 3,5  and  However,  state  vector  ZV5°  is  not  updated  yet.  During 

the  second  time  step  A/(/),  this  procedure  is  repeated:  States  U® ,  ^3  ?  U\  are  updated,  U$  is 
changed  again  due  to  fluxes  J- 3t5  and  ^4,5,  but  state  vector  is  still  not  updated.  After  two  time 
steps  at  level  /,  one  time  step  at  level  /  -  1  is  performed.  Fluxes  at  interface  (5,6)  are  computed, 
U?  is  changed  due  to  these  fluxes,  and  the  state  vector  is  finally  updated. 

For  leaves  that  have  neighbors  at  the  same  tree  level,  the  procedure  is  that  of  usual  conservative 
finite  differencing 


U T  =  U°  + 


A  t(l) 
A  X{ 


(J- f) 


(8) 


with  left  and  right  fluxes  evaluated  at  the  same  time  t  =  t°  +  At(l)/2 .  For  leaves  that  have  split 
neighbors,  flux  contributions  from  more  than  two  neighbors  at  two  different  moments  of  time, 
t  =  t°  +  At(l)/2  and  t  =  t°  +  3Af(/)/2,  are  taken  into  account,  again  in  a  conservative  fashion. 

Fluxes  are  obtained  by  solving  a  Riemann  problem  at  every  interface  according  to  [26].  Left 
and  right  states  for  the  Riemann  problems  are  obtained  by  a  piecewise  linear  reconstruction  [27] 
from  U°  state  vectors.  A  small  amount  of  dissipation  is  introduced  into  the  code  as  an  artificial 
diffusion  added  to  the  numerical  fluxes  [28].  For  cells  that  have  neighbors  at  the  same  tree  level, 
the  entire  integration  procedure  is  formally  second  order  accurate  both  in  space  and  in  time.  For 
cells  that  have  neighboring  leaves  at  different  levels,  the  accuracy  reduces  to  first  order. 

If  necessary,  accelerations  g  are  computed  in  the  beginning  of  every  advance  procedure.  To 
maintain  the  second  order  accuracy  in  time  for  the  source  term  <S,  the  state  U°  is  corrected  after 
every  advance  procedure  as 


—  g*  _  U2  —  U2 

U  :=  U  +  ^ - - — —  At  ;  E  :=  E  +  p - - - 


U  :=U  . 


6.  Mesh  refinement 

The  most  difficult  part  of  an  AMR  simulation  is  to  decide  where  and  when  to  refine  or  unrefine 
a  mesh.  It  is  also  the  most  problem-dependent  part.  Different  problems  may  require  different 
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criteria  for  refinement  and  unrefineinent.  The  refinement  procedure  currently  implemented  consists 
of  four  steps: 

1.  For  every  cell,  a  refinement  indicator.  0  <  £  <  1  is  computed.  Large  £  >  fsp|jt  indicates  that  a 
leaf  must  be  refined,  and  small  £  <  £j0in  indicates  that  a  split  cell  can  be  unrefined:  £sp\\t  and 
Cjoin  are  some  predefined  constant  values. 

2.  £  is  smoothed  in  order  to  prevent  cells  from  being  falsely  refined  (mesh  trashing)  in  places 
where  £  fluctuates  around  critical  values  £sp|jt  and  £j0in- 

3.  Leaves  are  refined  if  £  >  £spiit- 

4.  Split  cells  are  unrefined  if  £  <  £i0in  and  if  they  are  not  just  split. 

There  are  two  approaches  to  computing  £.  First,  it  may  be  computed  to  measure  the  con¬ 
vergence  of  a  solution  [2].  This  allows  to  control  the  accuracy  of  the  solution  “on  flight,”  but 
requires  that  the  solutions  on  the  fine  and  coarse  meshes  be  generated  simultaneously.  Another 
approach  is  to  use  an  indicator  proportional  to  gradients  in  the  solution.  Such  an  indicator  shows 
where  to  expect  a  large  error  in  the  solution  [16,19,20].  The  convergence  of  the  solution  must  be 
checked  afterwards  with  a  different  resolution.  We  adopt  the  second  approach,  but  note  that  the 
tree  structure  allows  both  approaches  to  be  implemented. 

The  indicator  £  is  constructed  as  a  maximum  of  several  indicators, 

£  =  max^1,^2,...)  ,  (10) 


each  of  which  is  either  a  shock  indicator,  contact  discontinuity  indicator,  or  a  gradient  indicator, 
all  normalized  to  unity,  0  <  £k  <  1. 

As  a  shock  indicator,  the  following  quantity  is  used  [28] 


£?  =  max  •  , 


j  =  1-7-6 


£ ?  -  = 


15  if  ,),/>,)  >  €s  an<^  ^  '  ( UiNb(i,j),k  Ui,k)  >  0; 

0,  otherwise; 


(11) 


where  6  =  1  —  2  -mod(j,  2),  k  =  int(j/2),  and  es  =  0.2  determines  the  minimum  shock  strength  to 
be  detected. 

A  discontinuity  indicator  is  defined  as  follows: 


=  ma xjlj  , 


£c  •  = 


1, 


min  (P,Nb(>.n>P<) 


<  es  and 


min (p,Nb(,,j),P,)  c  ' 


0,  otherwise; 


(12) 


9 


where  (c  =  0.2. 

A  gradient  indicator  for  a  variable  a  is  constructed  as 


£.a  =  max 
j=i-o 


/  Ikb.vM /,?)!  ~  kill  \ 
\max( |«i4v6(i,j)|?  \(li\) ) 


(13) 


where  a  may  be  a  mass  density,  energy  density,  pressure,  velocity,  vorticity.  etc. 

As  explained  in  the  beginning  of  this  section,  the  indicator  £  computed  according  to  (10)  - 
(13)  must  be  smoothed  before  it  is  used  for  refinement.  Smoothing  is  based  on  the  analogy  with 
the  propagation  of  a  reaction-diffusion  front.  Let  us  consider  f  as  a  concentration  of  a  reactant 
obeying  a  reaction-diffusion  equation 

§  =  /i-v2e  +  e,  (14) 

at 

where  t  is  a  fiducial  time,  K  =  2 ~2lL2  is  a  constant  diffusion  coefficient,  and 

(  1  ■>  if  1  >  £  >  £spiit  5 

Q  =  (15) 

1 0,  otherwise; 


is  a  reaction  rate.  A  steady-state  form  of  (14), 


d£ 

^  dx 


dx2 


+  Q 


describes  a  reaction  front  which  moves  with  a  constant  speed 


(16) 


Si  =  2~‘Ly/^,  (17) 

and  has  a  thickness 

Si  ~2~‘L/y/^.  (18) 

As  the  front  moves,  it  leaves  behind  cells  marked  with  £  =  1. 

The  values  of  £  from  (10)  are  used  as  the  initial  condition  for  (14).  The  equation  (14)  then 
describes  a  reaction  front  which  starts  at  places  where  £  >  £Spiit  *  and  propagates  outwards  normal 
to  itself  with  the  speed  S$.  By  integrating  (14),  this  front  is  advanced  approximately  2-3 
computational  cells.  According  to  the  Huygens  principle,  the  front  curvature  tends  to  decrease  with 
time,  so  that  boundaries  of  regions  marked  for  refinement  become  smoother.  Diffusion  prevails  over 
reaction  in  isolated  areas  with  £  >  £spnt  if  these  areas  have  a  size  less  than  ~  6 These  small  regions 
do  not  trigger  a  reaction  front,  and  disappear  under  the  refinement  threshold  (‘‘extinguish”).  This 
reduces  the  numerical  noise  in  selecting  cells  for  refinement  and  unrefinement.  To  further  reduce 
mesh  trashing,  split  cells  are  not  joined  if  this  produces  an  isolated  leaf. 
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7.  Numerical  examples 


Many  numerical  tests  including  advection.  shock  propagation  and  interaction  in  one,  two  and 
three  dimensions  were  used  to  validate  the  algorithms.  Here  we  show  some  test  examples  which 
illustrate  adaptive  mesh  refinement  aspects  of  the  algorithm's  performance. 

7.1  Isolated  shock  leave  passing  through  a  coarse -to- fine  interface 

Figure  4  shows  a  computation  of  an  isolated  M  —  10  strong  shock  in  an  ideal  gas  with  7  =  1.4. 
In  all  three  panels,  the  shock  is  moving  from  left  to  right.  The  upper  panel  shows  the  shock  profile 
at  one  time  during  the  shock  propagation  through  a  uniform  mesh.  The  shock  profile  is  two  cells 
wide,  and  oscillations  free.  When  the  shock  passes  from  a  fine  to  a  coarse  mesh  (middle  panel),  the 
amplitude  of  disturbances  generated  is  less  than  1%.  When  passing  from  a  coarse  to  a  fine  mesh 
(low  panel),  two  types  of  disturbances,  acoustic  and  entropy  waves,  are  generated  at  the  ~  5% 
amplitude. 

A  generation  of  spurious  disturbances  by  a  shock  passing  from  a  coarse  to  a  fine  mesh  is  a 
known  problem  [2,4,16].  An  illustrative  example  for  a  M  =  10  shock  in  an  ideal  gas  with  7  =  1.4  is 
presented  by  Berger  and  Colella  [2].  Berger  and  Colella  argue  that  the  disturbances  are  caused  by 
the  0(1)  errors  always  present  in  the  numerical  fluxes  in  the  vicinity  of  a  strong  shock.  The  errors 
do  not  cancel  each  other  if  mesh  spacing  is  changing.  Formally,  Berger  and  Colella’s  integration 
was  second-order  accurate  at  interfaces.  In  our  case,  the  formal  order  of  integration  at  interfaces 
is  first  order.  However,  our  example  shows  smaller  disturbances,  which  could  be  attributed  to  the 
smaller  refinement  ratio  (2:1  instead  of  4:1).  The  main  conclusion  from  this  test  is  that  care  must 
be  taken  not  to  let  a  strong  shock  enter  a  more  refined  mesh.  Strong  shocks  should  be  either 
refined  all  of  the  time,  or  not  refined  at  all. 

7.2  Density  discontinuity  passing  through  a  coarse-to-fine  interface 

Figure  5  shows  the  advection  of  a  slab  of  high  density,  cold  gas  surrounded  by  a  hot  gas  of 
lower  density.  Initial  conditions  for  this  problem  are  P  =  0.01  and  V  —  2  everywhere,  pco\d  =  3, 
phol  =  1.  Initially,  the  slab  is  20  cells  wide.  The  equation  of  state  is  that  of  an  ideal  gas  with 
7  =  1.4  constant.  Integration  is  done  with  cfl  =  0.7.  The  only  result  of  passing  a  slab  through 
coarse-to-fine  interfaces  is  changes  in  a.  number  of  cells  representing  contact  discontinuities.  The 
pressure  and  velocity  field  (not  shown)  both  remain  constant  to  machine  accuracy.  The  test  shows 
that  passing  a  discontinuity  through  coarse-to-fine  interfaces  does  not  generate  any  disturbances. 
However,  if  a  discontinuity  exits  a  refined  area,  the  resolution  is  lost.  If  it  enters  a  refined  area,  its 
width  does  not  decrease,  and  no  gain  in  resolution  is  obtained. 


11 


7.3  Planar  strong  point  explosion 


Figures  (i  and  7  shows  a  one-dimensional  (planar)  strong  point  explosion  in  an  ideal  gas  with 
7  =  1.4.  Initial  conditions  are  po  =  1,  Po  =  10~3  everywhere.  The  explosion  energy  E  =  2.5  x  109 
is  put  in  a  single  cell  by  increasing  the  pressure  in  that  cell  to  P  =  Po  +  (7  “  1)P/A.  where  A  is 
the  cell  size.  Figure  6  shows  the  explosion  computed  on  a  uniform  mesh  with  A  =  jrj.  The  exact 
solution  to  the  problem  is  shown  for  comparison  [29].  The  solutions  agree  near  the  center.  The 
difficulty  is  to  reproduce  the  solution  near  the  shock. 

Figure  7  shows  the  same  explosion  computed  with  eight  levels  of  refinement,  /  =  5  -r  12.  Two 
refinement  indicators  were  used,  one  for  shocks  (11)  and  another  one  for  the  pressure  gradient 
(13).  The  refinement  criteria  were  £sp|jt  =  0.5  and  fj0in  =  0.01.  In  this  case,  the  flow  behind  the 
shock  has  been  resolved.  The  number  of  cells  used  in  the  simulation  was  ~  250,  and  only  a  few 
(<  20.)  located  at  the  highest  level  /  =  12.  A  uniform  4096  mesh  would  be  required  to  get  the 
same  resolution.  Small  disturbances  at  fine-coarse  interfaces  are  seen  at  the  pressure  and  velocity 
plots.  Since  the  gradients  in  the  solution  are  large  everywhere,  these  disturbances  may  be  caused 
by  the  same  inexact  cancelling  of  numerical  errors  in  fluxes  at  the  fine-coarse  interfaces  discussed 
in  [2]  and  in  Section  7.1  above,  or  they  may  be  the  result  of  decreasing  the  order  of  the  accuracy 
of  integration  at  interfaces,  or  both.  Which  effect  is  important,  and  whether  it  is  worthwhile  to 
maintain  formal  second-order  accuracy  at  interfaces,  requires  further  study. 

7.4  Cylindrical  strong  point  explosion  in  a  box 

Figures  8  and  9  show  a  cylindrical  strong  point  explosion  in  a  square  box,  and  illustrate  how 
the  adaptive  mesh  algorithm  treats  a  very  complicated,  rapidly  evolving  flow  with  multiple  shocks, 
contact  discontinuities,  and  vortices.  Initial  conditions  in  the  box  are  p0  =  P0  =  1.  The  equation 
of  state  is  that  of  an  ideal  gas  with  7  =  1.4.  The  explosion  energy  E  =  105  is  put  in  a  single  cell 
of  size  “JotJ  located  at  x  —  0.35,  y  =  0.2.  The  computation  was  done  with  six  levels  of  refinement, 

/  =  5  -r-  10,  using  shock  and  pressure  gradient  refinement  criteria  with  £spKt  =  0.5  and  ^Q\n  =  0.05. 
The  Courant  number,  c/7,  is  0.7.  The  equivalent  uniform  resolution  would  require  a  1024  X  1024 
grid  or  220  =  1,048,576  cells. 

A  cylindrical  blast  wave  propagates  from  the  explosion  point,  and  first  hits  the  lower,  then  the 
left,  right  and  upper  walls  of  the  box.  Reflections  of  the  primary  shock  begin  as  regular  reflections, 
but  later  transform  into  Mach  reflections.  The  sound  speed  is  very  high  in  the  material  behind  the 
primary  shock.  As  a  result,  reflected  shocks  quickly  engulf  the  inside  of  the  hot  bubble  created  by 
the  initial  blast,  catch  up  with  the  primary  shock,  and  interact  with  walls  and  with  each  other. 
Figures  8a  (step  200)  and  8b  (step  300)  show  the  interaction  of  the  primary  shock  with  the  lower 
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wall.  In  Figure  8c  (step  400)  this  interaction  continues,  but  the  shock  is  also  reflecting  off  the  left 
wall.  Reflections  of  the  primary  shock  off  the  walls  are  irregular.  In  the  next  Figure  8d  (step  500), 
a  part  of  the  primary  shock  is  still  interacting  with  the  upper  and  right  walls.  In  Figure  8e  (step 
600),  the  primary  shock  disappeared,  and  the  collision  of  the  two  reflected  shocks  just  occurred 
near  the  upper  right  corner  of  the  box.  Figure  8f  shows  the  pressure  contours  for  step  600.  The 
mesh  for  the  two  time  steps,  400  and  600,  is  shown  in  Figure  9.  Figures  8  and  9  show  a  complicated 
pattern  of  secondary  shocks  resolved  by  the  mesh  refinement  algorithm. 

7.5  Shock  -  bubble  interaction  in  three  dimensions 

The  interaction  of  shock  waves  with  fluid  inhomogeneities  is  an  important  mechanism  of 
vorticity  generation  [30,31],  The  interaction  of  a  planar  shock  with  a  low-density  spherical  bubble 
of  gas  is  one  of  the  four  representative  cases  studied  theoretically  in  [31].  It  is  assumed  in  [31]  that 
the  gas  in  the  bubble  is  in  pressure  equilibrium  with  the  ambient  gas,  but  has  higher  temperature. 
This  is  different  from  the  conditions  of  the  experiments  [30]  where  the  density  contrast  was  created 
by  using  gases  with  different  molecular  weight.  Formulation  of  the  problem  in  [31]  is  highly  relevant 
to  the  shock-generated  vorticity  in  flames. 

The  computational  setting  for  the  shock  -  bubble  problem  is  shown  in  Figure  10.  A  spherical 
bubble  of  hot,  low  density  gas  is  located  inside  a  rectangular,  solid  wall  tube  of  length  1  and 
cross-section  j  x  The  computational  domain  of  size  Lx  x  Ly  x  Lz  =  1  x  j  x  \  represents  a 
quarter  of  the  tube  cross-section  assuming  symmetry  with  respect  to  the  middle  planes  y  —  \  and 
z  =  j.  The  radius  of  the  bubble  is  Ri  =  |.  The  bubble  is  centered  at  a  distance  =  \  from  the 
tube  entrance.  The  bubble  and  the  background  gas  are  in  the  pressure  equilibrium  at  a  common 
pressure  P0  =  1-  The  background  density  is  po  =  1  and  the  bubble  density  is  pi  =  0.166.  The 
equation  of  state  is  that  of  an  ideal  gas  with  constant  7  =  1.4.  A  shock  wave  with  the  Mach  number 
M  —  1.25  hits  the  bubble  from  the  left.  Boundary  conditions  are  inflow  on  the  tube  entrance,  and 
outflow  on  the  tube  exit.  Simulations  are  performed  using  five  levels  of  refinement  /  =  5  -r  9.  The 
maximum  equivalent  uniform  resolution  would  correspond  to  a  512  x  128  x  128  grid,  and  would 
require  223  =  8,388,608  cells  of  size  The  three-dimensional  resolution  in  this  test  case  is  20% 
higher  than  that  used  in  two-dimensional  simulations  [31].  Four  refinement  indicators  were  used: 
for  shocks  (15),  for  contact  discontinuities  (16)  and  for  density  and  pressure  gradients  (17).  The 
refinement  criteria  are  £spHt  =  0.5  and  £j0in  =  0.05.  The  Courant  number  cfl  =  0.7  was  used. 

Figure  11  shows  the  density  contours  and  the  mesh  in  the  XY  plane  z  =  ~  after  20  and  70  global 
time  steps  of  integration.  Figure  12  shows  the  density  contour  and  the  mesh  for  the  same  time 
steps  in  the  perpendicular  YZ-planes  x  =  0.35  and  x  —  0.6,  respectively.  The  shock  compresses  the 
bubble,  and  vorticity  is  generated  due  to  misalignment  of  the  pressure  and  density  gradients  during 
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the  interaction.  Vorticity  is  generated  mainly  at  the  bubble  surface  where  transmitted  shocks  are 
refracted.  Rotational  motions  generated  as  a  result  of  shock  refraction  further  deform  the  bubble, 
and  eventually  transform  it  into  the  rotating  ring.  The  results  of  the  simulation  presented  here 
are  in  a  good  general  agreement  with  the  results  of  two-dimensional  axially-symmetric  simulations 
[31].  Three-dimensional  simulations,  however,  allow  assumption  of  axial  symmetry  to  be  relaxed. 
Figure  12  shows  the  development  of  azimuthal  perturbations  on  the  surface  of  the  bubble  as  time 
progresses.  These  perturbations  are  initially  caused  by  the  the  Ravleigh-Taylor  instability  driven 
by  the  deceleration  of  material  between  the  bubble  and  the  spherical  transmitted  shock.  These 
perturbations  are  further  subjected  to  the  Kelvin- Helmholtz  instability  which  develops  due  to  the 
presence  of  the  velocity  component  tangent  to  the  bubble  surface.  The  tangential  component  has 
been  generated  during  the  shock  refraction  at  the  bubble  surface.  A  discussion  of  instabilities 
developing  during  the  shock-bubble  interaction  is  out  of  scope  of  this  paper,  and  will  be  presented 
elsewhere. 


8.  Algorithm  performance 

The  algorithm  consists  of  four  parts:  (1)  FTT  subroutines  for  tree  modifications,  dumping, 
performing  global  parallel  operations,  finding  neighbors,  parents  and  children,  etc.;  (2)  Fluid  dy¬ 
namics  subroutines;  (3)  Refinement  subroutines;  (4)  Service  subroutines:  driver,  print,  plot,  etc. 
The  code  is  written  entirely  in  Fortran  77.  The  line  count  for  each  part  is  given  in  Table  1  (com¬ 
ments  included).  The  critical  FTT  part  on  which  the  rest  of  the  algorithm  is  built  contains  less 
than  800  lines. 

Fourth  column  of  Table  1  shows  the  breakdown  of  computer  time  for  different  parts  of  the 
algorithm  compiled  using  a  Sun  f77  compiler  with  -05  optimization,  and  run  on  a  Sun  workstation 
under  Solaris  operating  system.  The  figures  show  that  the  FTT  part  of  the  code  consumes  a 
relatively  small  fraction  of  time.  The  fifth  column  gives  the  time  breakdown  for  the  same  code 
compiled  using  Cray  cf77  compiler  with  -Zu  parallel  option  enabled,  and  run  on  a  single  Cray 
J90  processor.  On  a  vector  machine,  the  relative  overhead  for  the  FTT  is  noticeably  larger, 
and  comprises  about  1/3  of  the  total  time.  This  is  because  FTT  uses  indirect  addressing  which 
slows  vector  operations.  In  the  fluid  dynamics  part  of  the  code,  work  is  done  by  looping  over 
contigous  blocks  of  memory,  and  all  loops  are  effectively  vectorized.  The  J90  computers  lack 
hardware  gather/scatter  capabilities,  have  only  two  paths  between  the  vector  processor  unit  and 
the  memory,  and  have  a  rather  long  memory  latency.  The  FTT  part  will  run  more  effectively  on 
Cray  C90  or  T90  computers.  On  a  single  processor  of  Cray  J90,  the  computer  time  to  advance 
one  cell  one  sub-step  is  30  //sec  per  dimension.  In  a  multi-processor  mode,  the  code  used  up  to  10 
processors  of  a  16  processor  J90  in  a  batch  environment. 
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Table  1 


algorithm's  part 

lines 

time,  %  (SUN) 

time,  %  (J90) 

FTT 

800 

11 

35 

Hydro 

1-185 

83 

61 

Refinement 

601 

3 

2 

Services 

626 

3 

2 

Total 

3512 

100 

100 

For  the  cylindrical  strong  point  explosion,  and  for  the  shock-bubble  interaction  tests,  Figures  13 
and  14  show  the  ratio  of  the  number  of  computational  cells  actually  used  in  the  simulation  to  the 
number  of  cells  required  to  achieve  an  equivalent  uniform  resolution.  The  ratio  varies  with  time. 
For  the  cylindrical  shock  test,  for  example,  it  is  almost  zero  in  the  beginning,  when  the  shocked 
material  occupies  only  a  small  fraction  of  the  computational  domain.  Then  the  ratio  grows  with 
time  almost  linearly  because  fine  cells  tend  to  concentrate  around  the  surface  of  the  shock  only. 
At  the  end  of  the  simulation,  the  ratio  reaches  its  highest  value  ~  0.1.  At  this  time  the  entire 
computational  domain  is  filled  with  shocks,  and  the  regions  of  constant  flow  have  disappeared.  On 
average,  the  ratios  for  both  simulations  are  ~  0.05  —  0.07  which  represents  a  factor  of  ~  15  savings 
in  memory  and  computer  time  compared  to  a  brute  force  uniform  grid  computations.  Here  we 
have  taken  into  account  that  the  FTT  overhead  would  be  absent  in  grid-based  computations. 

9.  Conclusions 

A  fully  threaded  tree  structure  (FTT)  for  adaptive  refinement  of  regular  meshes  has  been 
described.  The  FTT  allows  all  operations  and  modifications  of  the  tree  to  be  performed  in  parallel, 
which  makes  this  structure  well  suited  for  the  use  on  vector  and  parallel  computers.  A  filtering 
algorithm  has  been  described  for  removing  high-frequency  noise  in  mesh  refinement. 

The  FTT  has  been  applied  to  the  integration  of  the  Euler  equations  of  fluid  dynamics.  Time 
stepping,  and  mesh  refinement  algorithms  specific  to  the  integration  of  the  Euler  equations  were 
described.  The  integration  in  time  is  done  using  different  time  steps  at  different  tree  levels.  The 
integration  and  mesh  refinement  are  interleaved  to  avoid  creation  of  buffer  layers  of  fine  mesh  ahead 
of  moving  shocks  and  other  discontinuities.  The  time  stepping  algorithm  described  can  utilize  any 
method  of  evaluating  numerical  fluxes  which  is  at  least  second  order  accurate  in  space  and  time 
on  a  uniform  mesh. 

The  FTT  performance  on  a  Sun  workstation,  and  on  a  shared  memory  Cray  J90  computer 
is  reported.  The  FTT  has  low  memory  and  computer  time  overheads.  2  words  and  <  10  -  40% 
of  computer  time  per  computational  cell,  respectively.  A  factor  ~  15  savings  in  memory  and 
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computer  time  compared  to  equivalent  uniform-grid  computations  have  been  achieved  in  two-  and 
three-dimensional  test  simulations  presented  in  the  paper.  Three-dimensional  simulations  of  a 
shock  wave  interacting  with  a  spherical  bubble  of  light  gas  have  been  performed  that  shown  the 
development  of  azimuthal  instabilities  at  the  bubble  surface. 

There  are  many  uses  of  FTT  other  than  the  integration  of  the  Euler  equations,  such  as  the 
solution  of  parabolic  and  elliptical  partial  differential  equations,  pattern  recognition,  computer 
graphics,  particle  dynamics,  etc.  An  application  of  the  adaptive  refinement  tree  to  dark  matter 
cosmological  simulations  has  been  recently  described  in  [25].  An  application  of  FTT  to  reaction- 
diffusion  equations  will  be  described  elsewhere. 
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Fig.  1  -  Relationship  between  cells  for  a  one-dimensional,  binary,  fully  threaded  tree.  Each  cell  is 
represented  by  a  horizontal  barred  line.  The  length  of  a  line  corresponds  to  the  geometrical  size 
of  a  cell.  Cell  1  is  the  root  of  the  tree,  representing  the  entire  computational  domain.  Cells  1  and 
2  are  split  cells.  Cells  3,  4  and  5  are  leaves.  Pointers  to  children  and  parents  are  indicated  by 
straight  lines  without  arrows.  Pointers  to  neighbors  are  indicated  by  arrows. 
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Fig.  2  -  Relationship  between  the  cells  and  octs  in  a  fully  threaded  tree.  Pointers  from  octs  to 
cells  and  from  cells  to  octs  are  indicated  by  arrows. 
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Fig.  3  -  An  illustration  of  flux  evaluation  at  two  different  levels  of  the  tree.  Fluxes  at  interfaces 
between  fine  cells  (1-3  and  2-4),  and  fluxes  between  fine  and  coarse  cells  (3-5  and  4-5)  are  evaluated 
twice  as  often  as  fluxes  between  coarse  cells  (5-6).  Four  fluxes  from  the  fine  side  (3-5  and  4-5 
computed  at  two  different  times),  and  one  flux  (5-6)  from  the  coarse  side  contribute  to  changes  of 
the  state  vector  of  cell  5  during  a  coarse  time  step. 
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Fig.  4  -  An  isolated  shock  wave  with  a  Mach  number  M  =  10  in  an  ideal  gas  with  7  =  1.4; 
computed  with  cfl  =  0.7.  Density  is  shown  by  dots  whose  position  corresponds  to  centers  of 
computation  cells.  In  all  panels,  the  shock  is  moving  from  left  to  right.  Top  panel  —  shock  on  a 
uniform  mesh.  Middle  panel  -  shock  after  passing  from  fine  to  coarse  mesh.  Bottom  panel  -  shock 
after  passing  from  coarse  to  fine  mesh.  Coarse-to-fine  interface  is  located  near  x  ~  0.11.  Two 
disturbances  generated  during  the  passage  of  the  coarse-to-fine  interface  are  seen  on  the  bottom 
panel. 
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Fig.  5  -  Advection  of  a  high  density,  cold  slab  of  gas  with  />h igh  =  3  surrounded  by  a  low  density  hot 
gas  with  piow  =  1.  Pressure  is  P  =  0.01,  and  velocity  is  U  =  2  everywhere.  An  ideal  gas  EOS  with 
7  =  1.4;  computed  with  c/7  =  0.7.  Advection  is  taking  place  from  left  to  right.  A  coarse-to-fine 
interface  is  located  at  x  ~  0.38  (dashed  line).  Density  is  shown  by  dots  whose  position  corresponds 
to  centers  of  computation  cells.  Time-step  number  is  indicated  at  the  top  of  each  density  profile. 
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P0  =  10“  ,  initial  density  is  po  =  1,  and  the  explosion  energy  is  t  —  2.b  x  1U  .  t  omputed  using 
cfl  -  0.7  and  eight  levels  of  mesh  refinement.  Plots  of  density,  pressure,  velocity,  and  cell  level 
in  the  tree  are  shown  for  three  different  moments  of  time  (1)  -  1.15  X  10-6,  (2)  -  3.02  x  10-r>. 
and  (3)  -  6.07  x  10-6.  The  variables  are  shown  by  dots  whose  position  corresponds  to  centers  of 
computation  cells.  Solid  lines  -  the  exact  solution. 
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Fig.  8(c)  —  (Continued) 
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Fig.  9(a)  —  Same  as  Fig.  8  but  shows  the  computational  mesh:  (a)  -  step  400  and  (b)  -  step  600. 
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Fig.  9(b)  —  Same  as  Fig.  8  but  shows  the  computational  mesh:  (a)  —  step  400  and  (b)  —  step  600. 


33 


solid  wall 


7 


3 


outflow 


Fig.  10  -  Computational  setting  of  the  shock  wave  -  spherical  bubble  interaction  problem.  Depicts 
a  square  tube  of  size  1  x  j  X  j.  The  computational  domain  is  a  quarter  of  the  tube  cross-section 
shown  by  thick  lines  and  marked  by  the  numbers  1  to  8,  assuming  symmetry  with  respect  to  the 
XZ  plane  y  =  \  and  XY  plane  z  = 
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Fig.  11  -  Shock  wave  -  spherical  bubble  interaction.  Density  contours  A p  =  0.05  and  the  mesh  in 
the  XY  plane  1-2-5-6  (see  Fig.  10)  for  time  steps  20  and  70. 


